Discriminating Subsequence Discovery for Sequence Clustering
Authors
Abstract
In this paper, we explore the discriminating subsequence-based clustering problem. First, we propose several effective optimization techniques to accelerate the sequence mining process and develop a new algorithm, CONTOUR, which efficiently and directly mines a subset of discriminating frequent subsequences that can be used to cluster the input sequences. Second, we construct an accurate hierarchical clustering algorithm, SSC, based on the result of CONTOUR. The performance study evaluates the efficiency and scalability of CONTOUR and the clustering quality of SSC.
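To make the term "frequent subsequence" concrete, the following is a minimal Python sketch of support counting over a small sequence database. It is not CONTOUR: it enumerates candidates by brute force and applies none of the optimization techniques the abstract refers to. The function names, the `min_support` and `max_len` parameters, and the toy database are all assumptions introduced here for illustration.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must occur left to right.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Count how many sequences in database contain pattern as a subsequence."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def frequent_subsequences(database, min_support, max_len=3):
    """Brute-force illustration only: return {pattern: support} for every
    subsequence of length <= max_len whose support is >= min_support."""
    candidates = set()
    for seq in database:
        for k in range(1, max_len + 1):
            # combinations() preserves positional order, so each tuple
            # is a genuine subsequence of seq.
            candidates.update(combinations(seq, k))
    result = {}
    for pattern in candidates:
        s = support(pattern, database)
        if s >= min_support:
            result[pattern] = s
    return result


if __name__ == "__main__":
    # Hypothetical toy database of four sequences.
    db = ["abcd", "abd", "xyz", "xyaz"]
    for pattern, s in sorted(frequent_subsequences(db, min_support=2, max_len=2).items()):
        print("".join(pattern), s)
```

In this toy database, patterns such as `ab` and `xz` are each supported by a different pair of sequences, which is the intuition behind using frequent subsequences to separate the input into groups. Brute-force enumeration grows exponentially with `max_len`, so it is only practical for small examples.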
Similar resources
Theoretical Analysis of Subsequence Time-Series Clustering from a Frequency-Analysis Viewpoint
Although Subsequence Time Series (STS) clustering is one of the most popular techniques for pattern discovery from time-series data, a mathematical methodology for analyzing STS clustering (or pattern discovery from time-series data) has attracted little attention. Against this background, a surprising report [10] found that cluster centers obtained using STS clustering closely resemble "sine waves" w...
Hybrid Clustering Support Vector Machines by Incorporating Protein Residue Information for Protein Local Structure Prediction
Protein local structure prediction can be described as the prediction of protein secondary structure from a protein subsequence. This protein subsequence, also known as protein local structure, covers fragments of the protein sequence. In fact, it is easier to identify the sequence-to-secondary-structure relationship using a protein subsequence than using the whole protein sequence. Further, this...
A Review of Subsequence Time Series Clustering
Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequenc...
Selective Subsequence Time Series clustering
DOI: http://dx.doi.org/10.1016/j.knosys.2012.04.022. Subsequence Time Series (STS) Clustering is a time series mining task used to discover clusters of interesting subsequences in time series data...
Faster sequence homology searches by clustering subsequences
MOTIVATION: Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. RESULTS: We developed a fast homology search method based on database subsequence clusteri...
Journal title:
Volume, issue:
Pages: -
Publication date: 2007